
[Doc][Misc] Clarify private-repo PR workflow and community PR rollback guidance#4

Closed
Copilot wants to merge 2 commits into main from copilot/remove-pr-from-vllm-ascend

Conversation


Copilot AI commented Mar 11, 2026

The current problem is that the contribution workflow guidance is not clear enough, so changes can easily end up as a PR against the community repo vllm-project/vllm-ascend by mistake. This change unifies the workflow around "open PRs in the private repo / private fork" and adds recovery guidance for PRs mistakenly opened against the community repo.

  • Narrow the problem scope (PR destination)

    • Update the review-checklist requirement: PRs must be opened in the private repository (or a private fork), not in the community repo vllm-project/vllm-ascend.
  • Fix the submission workflow (Quick Start)

    • Step 10 now reads "push to the private repo / private fork".
    • The example remote URL is changed from vllm-ascend to vllm-ascend-dev so it no longer implies the community main repo.
  • Recovery guidance for mistakes

    • Add an explicit step: if a PR was accidentally opened against vllm-project/vllm-ascend, close it and re-open it from the private repo.

Updated example snippet:

git remote add myfork https://github.com/YOUR_USERNAME/vllm-ascend-dev.git
git push -u myfork your-branch-name

And add this note to the docs:

If you accidentally opened a PR in `vllm-project/vllm-ascend`,
close that PR and reopen it in your private repository.


Co-authored-by: lHrHenry233 <99803155+lHrHenry233@users.noreply.github.com>
Copilot AI changed the title [WIP] Remove commit from vllm ascend community PR [Doc][Misc] Clarify private-repo PR workflow and community PR rollback guidance Mar 11, 2026
@lHrHenry233 lHrHenry233 deleted the copilot/remove-pr-from-vllm-ascend branch April 1, 2026 07:20
lHrHenry233 pushed a commit that referenced this pull request Apr 10, 2026
…(v3.1)

- Port upstream _causal_conv1d_fwd_kernel as NPU Triton kernel
  - Handles initial/final/intermediate conv state in-kernel
  - Supports APC block boundary state writes
  - NPU adaptations: removed .cache_modifier, kept debug_barrier
- Rewrite causal_conv1d_fn to dispatch to new Triton kernel
- Rewrite gdn.py conv1d path: split decode/prefill like upstream
  - Decode: causal_conv1d_update_npu with block params
  - Prefill: causal_conv1d_fn with APC params (new kernel)
- Fix SSM #6: _build_initial_state only zeros prefill sequences
- Fix SSM #7: _write_final_states adds slot >= 0 validation
- Fix SSM #8: _scatter_intermediate_states adds unaligned offset
- Update all 36 UTs to pass with new num_computed_tokens_all field
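The SSM #6 fix above (zeroing only prefill sequences in `_build_initial_state`) can be sketched roughly as follows. This is a minimal illustration with plain lists and hypothetical names, not the actual kernel code; decode rows must keep their cached state, and only prefill rows without prior context start from zeros.

```python
def build_initial_state(conv_states, has_initial_state, num_prefills):
    """Sketch of the SSM #6 fix: zero conv state only for fresh prefills.

    conv_states: per-sequence state rows; rows 0..num_prefills-1 are
    prefill sequences, the remaining rows are decodes.
    has_initial_state[i]: True if prefill i continues prior context.
    """
    out = []
    for i, state in enumerate(conv_states):
        if i < num_prefills and not has_initial_state[i]:
            out.append([0.0] * len(state))  # fresh prefill: start from zeros
        else:
            out.append(list(state))         # keep cached / decode state
    return out
```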

Alignment status vs upstream #26807:
  #1 conv1d prefill kernel:     FIXED (kernel ported)
  #3 causal_conv1d_fn params:   FIXED (rewritten)
  #4 intermediate conv state:   FIXED (kernel internal)
  #6 SSM zeroing scope:         FIXED
  #7 _write_final_states guard: FIXED
  #8 SSM scatter alignment:     FIXED
  #9 causal_conv1d_fn signature: FIXED
  #2 decode pre-copy:           KEEP (NPU needs it)
  #5 SSM decode index:          OK (correct approach)
  #10 conv layout hardcoded:    DEFERRED

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
lHrHenry233 added a commit that referenced this pull request May 7, 2026
Five logic / hot-path issues in vllm_ascend/ops/triton/mamba/causal_conv1d.py:

1. (#1) Fallback non-APC `causal_conv1d_fn` ignored `has_initial_state[i]` and
   always seeded the convolution from `conv_states[cache_indices[i]]`. First
   prefill iterations (no prior context) silently used stale cache state.
   Now passes `initial_states=None` when `has_initial_state[i]` is False.

2. (#2) `forward_context` lookup at the top of `causal_conv1d_fn` ran on
   every per-layer APC prefill call, but `num_decodes` is only used by the
   non-APC fallback path. Moved the lookup inside the fallback branch so the
   APC hot path does not pay it 28x per prefill on Qwen3.5.

3. (#3) `NULL_BLOCK_ID` fell back to `PAD_SLOT_ID` (-1) when upstream did
   not export it. vLLM v1 `BlockPool` reserves block_id=0 as the null block
   and vLLM-Ascend `block_table.fill_(0)` matches that semantics, while
   PAD_SLOT_ID=-1 only applies to 1D slot/state-index padding. The fallback
   should be 0 so the `null_block_id != pad_slot_id` normalization actually
   masks 2D block-table placeholders. Documented the distinction at the
   constant definition site.

4. (#4) `_normalize_apc_input_layout` did four heuristic shape comparisons
   to auto-detect dim-major / weight-transposed inputs. This is ambiguous
   when dim==width or dim==tokens, and shifts layout-fix cost into the
   prefill hot path. Replaced with an explicit contract: x is [tokens, dim],
   weight is [dim, width]; fail loud otherwise. Removed the now-unused
   `input_is_dim_major` plumbing and the trailing `out.transpose(0,1).
   contiguous()` in `_causal_conv1d_fwd_npu`.

5. (#5) `_uses_apc_prefill_path` selected APC when ANY of five tensors was
   non-None, masking caller bugs (e.g. block_idx_* provided without a 2D
   block table). Now selects on cache_indices.dim()==2 and raises
   RuntimeError if APC auxiliary tensors are passed without a 2D block
   table, matching the documented dispatcher contract.
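The dispatch rule from point 5 can be sketched as below. The function name is illustrative and the tensor rank is passed as a plain int to keep the sketch self-contained; the point is that APC is selected only on a 2D block table, and auxiliary APC tensors without one are treated as a caller bug.

```python
def uses_apc_prefill_path(cache_indices_ndim, has_apc_aux_tensors):
    """Sketch of the point-5 contract: select APC iff 2D block table.

    cache_indices_ndim: rank of cache_indices (2 means a 2D block table).
    has_apc_aux_tensors: True if any APC auxiliary tensor was passed.
    """
    if cache_indices_ndim == 2:
        return True
    if has_apc_aux_tensors:
        # Previously this silently selected APC; now it fails loudly.
        raise RuntimeError(
            "APC auxiliary tensors passed without a 2D block table")
    return False
```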

Also dropped the redundant `conv_states.shape[-2] != dim and shape[-1] == dim:
transpose` probe in the dispatcher; the caller already passes the standard
`[num_cache_lines, dim, state_len]` layout, and silent re-permutation hides
upstream contract violations.
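The explicit layout contract from point 4 might look like the following shape check (a sketch; the function name is illustrative). It fails loudly instead of heuristically transposing, which is exactly the ambiguity when dim == width or dim == tokens.

```python
def check_conv_input_layout(x_shape, weight_shape):
    """Point-4 sketch: enforce x = [tokens, dim], weight = [dim, width]."""
    tokens, dim = x_shape
    w_dim, width = weight_shape
    if w_dim != dim:
        raise ValueError(
            f"layout contract violated: x {x_shape} vs weight "
            f"{weight_shape}; expected weight dim {dim}, got {w_dim}")
    return tokens, dim, width
```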

No behaviour change on the production caller path (gdn._run_all_mode_non_spec_conv1d),
verified by tracing layout invariants. The fixes harden the dispatcher
against future caller refactors and remove dead/wrong code.
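The null-block normalization from point 3 can be illustrated with the corrected fallback of 0 (a sketch; the helper name is hypothetical, and the block table is a nested list for simplicity):

```python
NULL_BLOCK_ID = 0   # vLLM v1 BlockPool reserves block_id=0 as the null block
PAD_SLOT_ID = -1    # applies only to 1D slot/state-index padding

def mask_null_blocks(block_table, null_block_id=NULL_BLOCK_ID,
                     pad_slot_id=PAD_SLOT_ID):
    # With the old fallback (null_block_id == pad_slot_id == -1) this
    # normalization was a no-op; with 0 it actually masks the 2D
    # block-table placeholders.
    if null_block_id == pad_slot_id:
        return block_table
    return [[pad_slot_id if b == null_block_id else b for b in row]
            for row in block_table]
```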

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: lHrHenry233 <2381623149@qq.com>